Tool Use from Python

Before reading any explanation, predict what happens when you run this code:

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    }
]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
)

print(response.stop_reason)
print(type(response.content[0]))

Write your prediction. Then continue.

# Output:
# tool_use
# <class 'anthropic.types.tool_use_block.ToolUseBlock'>

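The model did not answer the question. It stopped and asked your Python code to run a tool. stop_reason is "tool_use", not "end_turn", and the content block is a structured tool call rather than text. (The model sometimes emits a short text block before the tool call, so in real code search response.content for the ToolUseBlock instead of assuming it sits at index 0.) The model's response is a request, not an answer.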

This is the fundamental shift in agentic AI engineering: you are no longer calling an API and reading a response. You are running a loop. The model asks, your code executes, you report back, the model continues. Your Python code is the execution environment for the model's plan.

This lesson teaches you to build that loop correctly.

What You Will Learn

The tool use protocol: how the model requests a tool call and how you respond
Defining tools with JSON schemas, Pydantic models, and decorated Python functions
The agentic loop: structure, stopping conditions, and safety limits
Anthropic tool use API: tool_choice, tool_result blocks (ToolResultBlockParam), multi-turn with tools
OpenAI function calling: parallel calls, tool_choice, required vs auto
Building a ToolRegistry class that auto-generates schemas from type hints and docstrings
Error handling: what to send back when a tool raises an exception
Timeout and iteration limits to prevent runaway agents
Parallel tool execution when the model requests multiple tools simultaneously
Real tool implementations: web search, code execution, database queries, file I/O
Testing tool-using agents

Prerequisites

Familiarity with the Anthropic and OpenAI Python SDKs (Lessons 1-2)
Python type hints and inspect module basics
Dataclasses and TypedDict

Part 1 -- How Tool Use Works

The flow is a multi-turn protocol, not a single API call.

The model never executes tools directly. It generates a structured request saying "please run tool X with arguments Y and tell me what you get." Your Python code runs the tool, then sends the result back in the next API call. The model then generates either the final answer or another tool call.
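
The round trip described above can be sketched as the message history your code ends up holding after one tool call. This is an illustration only: the helper name tool_round_trip, the id value, and the tool output are made up, and the dict shapes follow the Anthropic-style blocks used later in this lesson.

```python
def tool_round_trip(question, tool_use_id, tool_name, tool_input, tool_output):
    """Return the message history after one completed tool round trip."""
    return [
        # 1. The user's request
        {"role": "user", "content": question},
        # 2. The model's reply: a structured request to run a tool
        {
            "role": "assistant",
            "content": [
                {"type": "tool_use", "id": tool_use_id, "name": tool_name, "input": tool_input}
            ],
        },
        # 3. Your code's report, sent as a user message and linked back by id
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": tool_output}
            ],
        },
    ]


history = tool_round_trip(
    "What is the weather in Paris right now?",
    "toolu_example_123",  # hypothetical id; real ids come from the API
    "get_weather",
    {"city": "Paris"},
    '{"temperature_c": 18, "description": "Light rain"}',  # made-up tool output
)
for message in history:
    print(message["role"])
```

The roles alternate user, assistant, user: the tool result travels in a user-role message, and the shared id is what ties the result to the request.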

Why This Design?

The model has no internet access, no filesystem access, no ability to run code. It is a text-in, text-out function. Tool use is the mechanism by which it extends its capabilities into the real world -- through your code as the intermediary. You control which tools exist, what they can do, and what information flows back to the model. This is both the power and the responsibility of tool use engineering.

Part 2 -- Defining Tools

Raw JSON Schema

The most explicit form -- useful when you need precise control or are generating schemas programmatically:

# Each tool is a dict with a name, a description, and an input_schema.
# The 'input_schema' field is a JSON Schema object describing the arguments the tool expects

weather_tool = {
    "name": "get_weather",
    "description": (
        "Retrieve current weather conditions for a specified city. "
        "Returns temperature in Celsius, a weather description, and humidity. "
        "Use this when the user asks about current weather."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'London' or 'New York'",
            },
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units. Defaults to celsius.",
            },
        },
        "required": ["city"],   # 'units' is optional -- omitted from required list
    },
}

search_tool = {
    "name": "web_search",
    "description": (
        "Search the web for current information. Use when the answer requires "
        "recent information not in your training data. Returns a list of search "
        "result snippets with titles and URLs."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query string.",
            },
            "num_results": {
                "type": "integer",
                "description": "Number of results to return. Default 5, max 10.",
                "minimum": 1,
                "maximum": 10,
            },
        },
        "required": ["query"],
    },
}
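
A schema like the ones above is also useful on your side of the protocol: you can check the arguments the model sends before running anything. The sketch below is a deliberately tiny check written for this lesson (the name check_tool_input is hypothetical). It only looks at required fields, unknown fields, and enum values, and is not a full JSON Schema validator.

```python
def check_tool_input(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed these checks."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field '{name}'")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(check_tool_input(schema, {"city": "London"}))  # []
print(check_tool_input(schema, {"units": "kelvin"}))
```

Returning the problems as text, rather than raising, makes it easy to send them back to the model as an error result so it can correct the call.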

Pydantic-Based Tool Schemas

Pydantic models generate JSON schemas automatically, so you do not have to write them by hand. This suits production systems well, because the same model class can also validate the arguments the model sends back:

from pydantic import BaseModel, Field
import json


class GetWeatherInput(BaseModel):
    """Input schema for the get_weather tool."""
    city: str = Field(description="City name, e.g. 'London' or 'New York'")
    units: str = Field(
        default="celsius",
        description="Temperature units: 'celsius' or 'fahrenheit'",
        pattern="^(celsius|fahrenheit)$",
    )


class WebSearchInput(BaseModel):
    query: str = Field(description="The search query string.")
    num_results: int = Field(
        default=5,
        ge=1,
        le=10,
        description="Number of results to return. Default 5, max 10.",
    )


def pydantic_to_anthropic_tool(
    name: str,
    description: str,
    model: type[BaseModel],
) -> dict:
    """Convert a Pydantic model into an Anthropic tool definition.

    The Pydantic model provides the input_schema automatically.
    Field descriptions populate the JSON schema 'description' fields.
    """
    schema = model.model_json_schema()

    # Drop the top-level 'title' that Pydantic adds; the tool already has a name.
    # (Per-property 'title' keys remain in the schema.)
    schema.pop("title", None)

    return {
        "name": name,
        "description": description,
        "input_schema": schema,
    }


# Generate the tool definitions
weather_tool = pydantic_to_anthropic_tool(
    "get_weather",
    "Get current weather conditions for a city.",
    GetWeatherInput,
)

# Verify the generated schema
print(json.dumps(weather_tool, indent=2))

Part 3 -- The Agentic Loop

The agentic loop is the core pattern for tool-using applications. It runs until the model produces a final answer or a safety limit is hit.

import anthropic
from anthropic.types import ToolUseBlock, TextBlock, Message

client = anthropic.Anthropic()


def run_agent(
    user_message: str,
    tools: list[dict],
    tool_implementations: dict[str, callable],
    system_prompt: str = "",
    model: str = "claude-opus-4-6",
    max_iterations: int = 10,
) -> str:
    """Run an LLM agent with tool use until it produces a final answer.

    Args:
        user_message: The user's initial request.
        tools: List of tool definitions (Anthropic format).
        tool_implementations: Dict mapping tool name to Python callable.
        system_prompt: Optional system-level instructions.
        model: The model to use.
        max_iterations: Safety limit on the number of tool call rounds.

    Returns:
        The model's final text response.

    Raises:
        MaxIterationsError: If max_iterations is reached without a final answer.
    """
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        # Call the model
        response = client.messages.create(
            model=model,
            max_tokens=4_096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )

        # If the model is done, extract and return the final text
        if response.stop_reason == "end_turn":
            # A response can contain several TextBlocks; join them in order.
            # Returns "" if there is no text block at all.
            return "".join(
                block.text for block in response.content if isinstance(block, TextBlock)
            )

        # The model wants to use tools
        if response.stop_reason == "tool_use":
            # Append the model's response (including tool call requests) to history
            # IMPORTANT: you must include the model's tool_use blocks in the history
            # before adding tool results, or the API will reject the next call
            messages.append({"role": "assistant", "content": response.content})

            # Execute each tool the model requested
            tool_results = []
            for block in response.content:
                if not isinstance(block, ToolUseBlock):
                    continue

                result = execute_tool_safely(
                    tool_name=block.name,
                    tool_input=block.input,
                    implementations=tool_implementations,
                )

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,  # Must match the ToolUseBlock's id
                    "content": result["content"],
                    "is_error": result["is_error"],
                })

            # Send tool results back to the model
            messages.append({"role": "user", "content": tool_results})
            continue

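        # Unexpected stop reason (for example "max_tokens"): fail loudly instead of
        # falling through to the iteration-limit error below
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")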

    raise MaxIterationsError(
        f"Agent did not produce a final answer within {max_iterations} iterations. "
        f"Last stop_reason: {response.stop_reason}"
    )


class MaxIterationsError(RuntimeError):
    """Raised when an agent loop exceeds the iteration limit."""
    pass
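
One way to see the loop's two stopping conditions without any network calls is to replace the API client with a scripted stand-in. The sketch below is a simplified, provider-neutral version of the loop above. ScriptedModel, run_loop, and the response dict shape are all hypothetical names chosen for this illustration, not part of any SDK.

```python
class ScriptedModel:
    """Hypothetical stand-in for the API client: replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = 0

    def create(self, messages):
        self.calls += 1
        return self._responses.pop(0)


def run_loop(model, user_message, tools, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = model.create(messages)
        if response["stop_reason"] == "end_turn":
            return response["text"]
        # stop_reason == "tool_use": record the request, run it, report back
        messages.append({"role": "assistant", "content": response["tool_calls"]})
        results = [
            {"tool_use_id": call["id"], "content": tools[call["name"]](**call["input"])}
            for call in response["tool_calls"]
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer within {max_iterations} iterations")


script = [
    {"stop_reason": "tool_use",
     "tool_calls": [{"id": "call_1", "name": "add", "input": {"a": 2, "b": 3}}]},
    {"stop_reason": "end_turn", "text": "2 + 3 = 5"},
]
model = ScriptedModel(script)
answer = run_loop(model, "What is 2 + 3?", {"add": lambda a, b: str(a + b)})
print(answer, model.calls)  # 2 + 3 = 5 2
```

A script that never reaches "end_turn" exercises the other exit: the loop gives up after max_iterations and raises, which is the behaviour you want from a runaway agent.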

Safe Tool Execution

import json
from typing import Any


def execute_tool_safely(
    tool_name: str,
    tool_input: dict,
    implementations: dict[str, callable],
) -> dict[str, Any]:
    """Execute a tool and return a result dict that is always safe to send back.

    Key design principle: the tool execution layer NEVER crashes the agent loop.
    If a tool fails, we tell the model what went wrong and let it decide how
    to proceed (retry, use a different tool, report the error to the user).

    Returns:
        {
            "content": str | list,  # The result text, or error description
            "is_error": bool,       # True if the tool raised an exception
        }
    """
    if tool_name not in implementations:
        return {
            "content": (
                f"Tool '{tool_name}' is not available. "
                f"Available tools: {list(implementations.keys())}"
            ),
            "is_error": True,
        }

    try:
        result = implementations[tool_name](**tool_input)

        # Normalise result to string if it is not already
        if isinstance(result, str):
            content = result
        elif isinstance(result, (dict, list)):
            content = json.dumps(result, indent=2, default=str)
        else:
            content = str(result)

        return {"content": content, "is_error": False}

    except TypeError as e:
        # Wrong arguments -- this usually means the schema is wrong
        return {
            "content": (
                f"Tool '{tool_name}' received unexpected arguments: {e}. "
                f"Arguments provided: {tool_input}"
            ),
            "is_error": True,
        }
    except Exception as e:
        # Any other failure -- give the model enough context to react
        return {
            "content": (
                f"Tool '{tool_name}' raised {type(e).__name__}: {e}\n"
                f"Arguments: {tool_input}"
            ),
            "is_error": True,
        }
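
The same "never crash the loop" principle extends to tools that hang rather than fail. Below is a minimal sketch of a per-call timeout using a worker thread. The helper name call_with_timeout and the result shape are assumptions that mirror execute_tool_safely, and this is only one possible approach, since a Python thread cannot be forcibly stopped.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, kwargs: dict, timeout_s: float) -> dict:
    """Run func(**kwargs) in a worker thread; give up waiting after timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, **kwargs)
    try:
        return {"content": str(future.result(timeout=timeout_s)), "is_error": False}
    except FutureTimeout:
        return {"content": f"Tool timed out after {timeout_s} seconds.", "is_error": True}
    except Exception as e:
        return {"content": f"{type(e).__name__}: {e}", "is_error": True}
    finally:
        # Do not block on the worker: a stuck tool keeps running in the
        # background, we simply stop waiting for it
        executor.shutdown(wait=False)


def slow_tool(seconds: float) -> str:
    time.sleep(seconds)
    return "done"


timed_out = call_with_timeout(slow_tool, {"seconds": 0.5}, timeout_s=0.05)
finished = call_with_timeout(slow_tool, {"seconds": 0.0}, timeout_s=1.0)
print(timed_out)
print(finished)
```

Because the abandoned thread keeps running, this pattern fits I/O-bound tools best. For work that must actually be terminated, a subprocess with a timeout is the safer choice.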

Part 4 -- Anthropic Tool Use: Full Protocol

import json

import anthropic
from anthropic.types import (
    ToolUseBlock,
    TextBlock,
    ToolResultBlockParam,
)

client = anthropic.Anthropic()


def demonstrate_anthropic_tool_protocol() -> None:
    """Show the raw multi-turn Anthropic tool use protocol step by step."""

    # ----- Turn 1: Model requests a tool -----
    response1 = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1_024,
        tools=[weather_tool],
        # tool_choice controls whether and how the model uses tools:
        # {"type": "auto"}     -- model decides (default)
        # {"type": "any"}      -- model must use at least one tool
        # {"type": "tool", "name": "get_weather"}  -- must use this specific tool
        tool_choice={"type": "auto"},
        messages=[
            {"role": "user", "content": "What is the weather in Tokyo right now?"}
        ],
    )

    print(f"Stop reason: {response1.stop_reason}")  # tool_use
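    # The model may emit a TextBlock before the tool call, so search for the
    # ToolUseBlock instead of assuming it is at index 0
    tool_block = next(b for b in response1.content if isinstance(b, ToolUseBlock))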
    print(f"Tool requested: {tool_block.name}")     # get_weather
    print(f"Tool ID:        {tool_block.id}")        # toolu_01Abc...
    print(f"Tool input:     {tool_block.input}")     # {'city': 'Tokyo', 'units': 'celsius'}

    # ----- Execute the tool in Python -----
    tool_result_text = json.dumps({
        "city": "Tokyo",
        "temperature_c": 22,
        "description": "Partly cloudy",
        "humidity_pct": 68,
    })

    # ----- Turn 2: Send tool result back -----
    # The history must include:
    # 1. The original user message
    # 2. The model's response including the ToolUseBlock (response1.content)
    # 3. A new user message containing the ToolResultBlockParam
    response2 = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1_024,
        tools=[weather_tool],
        messages=[
            {"role": "user", "content": "What is the weather in Tokyo right now?"},
            {"role": "assistant", "content": response1.content},  # Include tool call
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_block.id,  # Must match the tool call ID
                        "content": tool_result_text,
                        # "is_error": True  # Set this if the tool failed
                    }
                ],
            },
        ],
    )

    print(f"\nStop reason: {response2.stop_reason}")  # end_turn
    print(f"Final answer: {response2.content[0].text}")
    # "The current weather in Tokyo is 22 degrees Celsius with partly cloudy skies..."

Handling Multiple Tool Calls in One Turn

The model may request multiple tools in a single response. Process all of them before sending results back:

def handle_multi_tool_response(
    response: anthropic.types.Message,
    implementations: dict[str, callable],
) -> list[dict]:
    """Process all tool calls in a model response.

    The model may request 1 or more tools in a single turn.
    Always collect ALL results before sending back -- never send
    partial results from some tools while others are pending.

    Returns:
        List of ToolResultBlockParam dicts ready to send as the next user message.
    """
    tool_results = []

    for block in response.content:
        if not isinstance(block, ToolUseBlock):
            continue  # Skip TextBlocks (model may include explanation text)

        result = execute_tool_safely(
            tool_name=block.name,
            tool_input=block.input,
            implementations=implementations,
        )

        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result["content"],
            "is_error": result["is_error"],
        })

    return tool_results

Part 5 -- OpenAI Function Calling

OpenAI uses slightly different terminology but the same protocol structure:

import openai
import json

client_oai = openai.OpenAI()


def openai_tool_definition() -> dict:
    """OpenAI uses 'function' type with 'function' sub-object."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {                     # 'parameters' not 'input_schema'
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name",
                    },
                    "units": {
                        "type": ["string", "null"],   # Nullable stands in for "optional"
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units, or null for the default",
                    },
                },
                # Strict mode requires every property to be listed in 'required'
                "required": ["city", "units"],
                "additionalProperties": False,   # Also required by strict mode
            },
            "strict": True,  # Structured outputs -- model follows schema exactly
        },
    }


def run_openai_agent(
    user_message: str,
    tools: list[dict],
    tool_implementations: dict[str, callable],
    model: str = "gpt-4o",
    max_iterations: int = 10,
) -> str:
    """OpenAI agentic loop with function calling."""
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client_oai.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            # tool_choice options:
            # "auto"       -- model decides (default)
            # "required"   -- model must call at least one tool
            # "none"       -- model must not call any tools
            # {"type": "function", "function": {"name": "..."}}  -- specific tool
            tool_choice="auto",
            parallel_tool_calls=True,   # Allow multiple simultaneous tool calls
        )

        choice = response.choices[0]
        messages.append(choice.message.model_dump())  # Add assistant response to history

        if choice.finish_reason == "stop":
            return choice.message.content or ""

        if choice.finish_reason == "tool_calls":
            # Process all tool calls (may be parallel if parallel_tool_calls=True)
            for tool_call in choice.message.tool_calls:
                try:
                    arguments = json.loads(tool_call.function.arguments)
                except json.JSONDecodeError as e:
                    result_text = f"Failed to parse tool arguments: {e}"
                    is_error = True
                else:
                    result = execute_tool_safely(
                        tool_name=tool_call.function.name,
                        tool_input=arguments,
                        implementations=tool_implementations,
                    )
                    result_text = result["content"]
                    is_error = result["is_error"]

                # OpenAI has no is_error field on tool messages, so mark failures
                # in the text itself; the "ERROR:" prefix is a convention, not API
                if is_error:
                    result_text = f"ERROR: {result_text}"

                # OpenAI tool results go as individual 'tool' role messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,  # Must match the tool_call's id
                    "content": result_text,
                })
            continue

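        # Unexpected finish_reason (for example "length"): fail loudly instead of
        # falling through to the iteration-limit error below
        raise RuntimeError(f"Unexpected finish_reason: {choice.finish_reason}")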

    raise MaxIterationsError(
        f"OpenAI agent exceeded {max_iterations} iterations."
    )
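
If you already have Anthropic-style schemas with optional parameters, the "strict" flag shown earlier needs some care. The sketch below assumes strict mode requires every property to appear in 'required' and 'additionalProperties' to be false, with optional parameters expressed as nullable types. The helper to_strict_schema is a hypothetical name, and it only handles flat, single-level schemas.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy of a flat object schema adjusted for strict mode.

    Assumes a single-level schema whose properties each have a simple
    string "type"; nested objects are not handled in this sketch.
    """
    strict = copy.deepcopy(schema)
    required = set(strict.get("required", []))
    for name, spec in strict.get("properties", {}).items():
        if name not in required and isinstance(spec.get("type"), str):
            # Optional parameter: keep it expressible by allowing null
            spec["type"] = [spec["type"], "null"]
    strict["required"] = list(strict.get("properties", {}))
    strict["additionalProperties"] = False
    return strict


source = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string"},
    },
    "required": ["city"],
}
converted = to_strict_schema(source)
print(converted["required"])                     # ['city', 'units']
print(converted["properties"]["units"]["type"])  # ['string', 'null']
```

Your tool implementation then has to treat a null argument as "use the default", since the model will always send every field.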

Parallel Tool Calls

When parallel_tool_calls=True and the model requests multiple tools simultaneously, you should execute them concurrently:

import asyncio
import json
import openai

client_oai_async = openai.AsyncOpenAI()


async def execute_tools_parallel(
    tool_calls: list,
    implementations: dict[str, callable],
) -> list[dict]:
    """Execute multiple tool calls in parallel using asyncio.

    When the model requests 3 tools at once, there is no reason to run them
    sequentially. Execute all of them concurrently and return all results.
    Total time = max(individual_tool_times), not sum(individual_tool_times).
    """

    async def run_one(tool_call) -> dict:
        try:
            arguments = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError as e:
            return {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": f"Failed to parse arguments: {e}",
            }

        # If the tool implementation is async, await it; otherwise run in executor
        impl = implementations.get(tool_call.function.name)
        if impl is None:
            content = f"Unknown tool: {tool_call.function.name}"
        elif asyncio.iscoroutinefunction(impl):
            try:
                result = await impl(**arguments)
                content = json.dumps(result) if not isinstance(result, str) else result
            except Exception as e:
                content = f"Tool error: {type(e).__name__}: {e}"
        else:
            # Run synchronous tools in a worker thread to avoid blocking the event loop
            # (asyncio.get_event_loop() is deprecated inside coroutines)
            try:
                result = await asyncio.to_thread(impl, **arguments)
                content = json.dumps(result) if not isinstance(result, str) else result
            except Exception as e:
                content = f"Tool error: {type(e).__name__}: {e}"

        return {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": content,
        }

    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])
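
The timing claim in the docstring above is easy to check in isolation. Here is a small self-contained sketch, with a made-up blocking function slow_lookup standing in for a real tool, that runs three blocking calls concurrently and measures the wall-clock time.

```python
import asyncio
import time


def slow_lookup(name: str) -> str:
    """Stand-in for a blocking tool such as an HTTP request."""
    time.sleep(0.2)
    return f"result for {name}"


async def run_all(names: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in a worker thread, and
    # gather returns results in the same order as its arguments
    return await asyncio.gather(*(asyncio.to_thread(slow_lookup, n) for n in names))


start = time.perf_counter()
results = asyncio.run(run_all(["a", "b", "c"]))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Three 0.2-second calls finish in roughly 0.2 seconds rather than 0.6. Because gather preserves order, each result can be matched back to its tool call by position as well as by id.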

Part 6 -- Building a ToolRegistry

Rather than maintaining tool schemas and implementations separately (and keeping them in sync), build a registry that derives the schema automatically from the Python function:

import inspect
import json
import re
from typing import Any, Callable, get_type_hints
from functools import wraps


class ToolRegistry:
    """Auto-registers Python functions as LLM tools.

    Derives tool schemas from:
    - Function name (used as tool name)
    - Docstring first paragraph (used as tool description)
    - Type hints (mapped to JSON Schema types)
    - Default values (determines if parameter is required)
    - :param name: doc lines in docstring (parameter descriptions)

    This means you write normal Python functions with proper docstrings and
    the registry handles all the JSON Schema boilerplate. The schema stays
    in sync with the implementation automatically.
    """

    # Python type -> JSON Schema type mapping
    _TYPE_MAP: dict[type, str] = {
        str:   "string",
        int:   "integer",
        float: "number",
        bool:  "boolean",
        list:  "array",
        dict:  "object",
    }

    def __init__(self):
        self._tools: dict[str, dict] = {}             # name -> Anthropic tool definition
        self._implementations: dict[str, Callable] = {}  # name -> Python function

    def tool(
        self,
        name: str | None = None,
        description: str | None = None,
    ) -> Callable:
        """Decorator to register a function as an LLM tool.

        Usage:
            @registry.tool()
            def get_weather(city: str, units: str = "celsius") -> str:
                '''Get current weather for a city.

                :param city: City name, e.g. 'London'
                :param units: Temperature units: 'celsius' or 'fahrenheit'
                '''
                ...
        """
        def decorator(func: Callable) -> Callable:
            tool_name = name or func.__name__
            tool_desc = description or self._extract_description(func)
            schema = self._build_schema(func)

            self._tools[tool_name] = {
                "name": tool_name,
                "description": tool_desc,
                "input_schema": schema,
            }
            self._implementations[tool_name] = func

            @wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            wrapper._tool_name = tool_name
            return wrapper

        return decorator

    def _extract_description(self, func: Callable) -> str:
        """Extract the first paragraph of the docstring as the description."""
        doc = inspect.getdoc(func) or ""
        # First paragraph is everything before the first blank line or :param
        lines = []
        for line in doc.splitlines():
            if line.startswith(":param") or line.startswith(":return"):
                break
            if not line and lines:
                break
            if line:
                lines.append(line)
        return " ".join(lines) if lines else f"Execute the {func.__name__} operation."

    def _extract_param_docs(self, func: Callable) -> dict[str, str]:
        """Extract :param name: description lines from the docstring."""
        doc = inspect.getdoc(func) or ""
        param_docs: dict[str, str] = {}
        for match in re.finditer(r":param\s+(\w+):\s*(.+)", doc):
            param_docs[match.group(1)] = match.group(2).strip()
        return param_docs

    def _build_schema(self, func: Callable) -> dict:
        """Build a JSON Schema input_schema from the function signature."""
        sig = inspect.signature(func)
        hints = get_type_hints(func)
        param_docs = self._extract_param_docs(func)

        properties: dict[str, dict] = {}
        required: list[str] = []

        for param_name, param in sig.parameters.items():
            if param_name in ("self", "cls"):
                continue

            # Determine JSON Schema type from type hint
            hint = hints.get(param_name, str)
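            # Reduce the hint to a plain type that _TYPE_MAP knows about
            origin = getattr(hint, "__origin__", None)
            if origin in self._TYPE_MAP:
                # Parameterised generic such as list[str] or dict[str, int]
                hint = origin
            elif hasattr(hint, "__args__"):
                # Optional[X], Union[X, None], or X | None: use the first non-None member
                args = [a for a in hint.__args__ if a is not type(None)]
                hint = args[0] if args else str
                # The unwrapped member may itself be parameterised, e.g. Optional[list[str]]
                hint = getattr(hint, "__origin__", hint)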

            json_type = self._TYPE_MAP.get(hint, "string")

            prop: dict[str, Any] = {"type": json_type}
            if param_name in param_docs:
                prop["description"] = param_docs[param_name]

            properties[param_name] = prop

            # If no default value, the parameter is required
            if param.default is inspect.Parameter.empty:
                required.append(param_name)

        return {
            "type": "object",
            "properties": properties,
            "required": required,
        }

    def get_tools(self) -> list[dict]:
        """Return the list of tool definitions for the Anthropic API."""
        return list(self._tools.values())

    def get_openai_tools(self) -> list[dict]:
        """Return tool definitions in OpenAI format."""
        openai_tools = []
        for tool in self._tools.values():
            openai_tools.append({
                "type": "function",
                "function": {
                    "name": tool["name"],
                    "description": tool["description"],
                    "parameters": tool["input_schema"],
                },
            })
        return openai_tools

    def get_implementations(self) -> dict[str, Callable]:
        """Return the tool implementation dict for the agent loop."""
        return dict(self._implementations)

    def __contains__(self, name: str) -> bool:
        return name in self._tools

    def __repr__(self) -> str:
        return f"ToolRegistry({list(self._tools.keys())})"


# Example: registering tools with the registry
registry = ToolRegistry()


@registry.tool()
def get_weather(city: str, units: str = "celsius") -> str:
    """Get current weather conditions for a city.

    :param city: City name, e.g. 'London' or 'New York'
    :param units: Temperature units: 'celsius' or 'fahrenheit'
    """
    # In production: call a real weather API
    weather_data = {
        "london": {"temp_c": 15, "description": "Overcast", "humidity": 82},
        "new york": {"temp_c": 22, "description": "Sunny", "humidity": 45},
        "tokyo": {"temp_c": 28, "description": "Partly cloudy", "humidity": 70},
    }
    data = weather_data.get(city.lower(), {"temp_c": 20, "description": "Unknown", "humidity": 50})

    temp = data["temp_c"]
    if units == "fahrenheit":
        temp = round(temp * 9/5 + 32, 1)
        unit_str = "F"
    else:
        unit_str = "C"

    return json.dumps({
        "city": city,
        "temperature": f"{temp}{unit_str}",
        "description": data["description"],
        "humidity_pct": data["humidity"],
    })


@registry.tool()
def web_search(query: str, num_results: int = 5) -> str:
    """Search the web for current information.

    :param query: Search query string.
    :param num_results: Number of results to return (1-10).
    """
    # In production: call a real search API (Brave, Bing, SerpAPI, etc.)
    # This is a stub that returns plausible fake results for testing
    return json.dumps([
        {
            "title": f"Result {i+1} for: {query}",
            "snippet": f"This is result {i+1}. Relevant content about {query}.",
            "url": f"https://example.com/result/{i+1}",
        }
        for i in range(min(num_results, 5))
    ])


# Inspect the auto-generated schema
print(json.dumps(registry.get_tools()[0], indent=2))
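
One thing the registry above loses compared with the hand-written schema is the enum on units: a plain str hint cannot express a fixed set of values. A possible extension, sketched here as a standalone helper and not as part of the ToolRegistry class, is to map typing.Literal hints to JSON Schema enums. The names schema_for_hint and _JSON_TYPES are hypothetical.

```python
from typing import Literal, get_args, get_origin

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for_hint(hint) -> dict:
    """Map a type hint to a JSON Schema fragment, turning Literal into enum."""
    if get_origin(hint) is Literal:
        values = list(get_args(hint))
        # Assumes all Literal members share one type, taken from the first
        return {"type": _JSON_TYPES.get(type(values[0]), "string"), "enum": values}
    return {"type": _JSON_TYPES.get(hint, "string")}


Units = Literal["celsius", "fahrenheit"]
print(schema_for_hint(Units))  # {'type': 'string', 'enum': ['celsius', 'fahrenheit']}
print(schema_for_hint(int))    # {'type': 'integer'}
```

With a helper like this, a signature such as units: Literal["celsius", "fahrenheit"] = "celsius" would carry the allowed values into the schema the model sees.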

Part 7 -- Real Tool Implementations

Code Execution Tool

import subprocess
import tempfile
import os
from pathlib import Path


@registry.tool()
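def execute_python(code: str, timeout_seconds: int = 10) -> str:
    """Execute Python code in a separate subprocess with a timeout and return the output.

    Note: a subprocess with a trimmed environment is isolation from this process,
    not a security sandbox. Run it inside a container or VM for untrusted code.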

    Use this to run calculations, data transformations, or verify logic.
    Do not use this to run code that has side effects on production systems.

    :param code: Python code to execute.
    :param timeout_seconds: Maximum execution time in seconds (max 30).
    """
    # Hard cap -- model cannot override; also guards against zero or negative values
    timeout_seconds = max(1, min(timeout_seconds, 30))

    # Write code to a temp file rather than passing it via -c to avoid
    # command-line length limits (no shell is involved either way)
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        f.write(code)
        tmp_path = f.name

    try:
        result = subprocess.run(
            ["python3", tmp_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            # Restrict environment: no access to production env vars
            env={
                "PATH": "/usr/bin:/bin",
                "HOME": "/tmp",
                "PYTHONPATH": "",
            },
        )

        output_parts = []
        if result.stdout.strip():
            output_parts.append(f"stdout:\n{result.stdout.strip()}")
        if result.stderr.strip():
            output_parts.append(f"stderr:\n{result.stderr.strip()}")
        if result.returncode != 0:
            output_parts.append(f"exit_code: {result.returncode}")

        return "\n\n".join(output_parts) or "(no output)"

    except subprocess.TimeoutExpired:
        return f"Execution timed out after {timeout_seconds} seconds."
    finally:
        os.unlink(tmp_path)  # Always clean up the temp file
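
The timeout handling is the part of this tool most worth testing on its own. Here is a stripped-down sketch (the helper name run_snippet is made up, and it uses -c for brevity, which is fine for short snippets) showing that subprocess.run stops the child and raises TimeoutExpired when the limit is hit.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_seconds: float) -> str:
    """Run code in a child interpreter; report a timeout instead of raising."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_seconds} seconds"
    return result.stdout.strip()


print(run_snippet("print(6 * 7)", timeout_seconds=10))  # 42
print(run_snippet("import time; time.sleep(30)", timeout_seconds=0.5))
```

The second call returns after about half a second with a timeout message, even though the snippet asked to sleep for thirty seconds.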

Database Query Tool

import sqlite3
from typing import Any


class DatabaseQueryTool:
    """Wraps a database connection for use as an LLM tool.

    Demonstrates the pattern of tools that hold state (a connection pool
    or connection object) while exposing a stateless interface to the model.
    """

    def __init__(self, db_path: str, read_only: bool = True):
        """
        Args:
            db_path: Path to the SQLite database file.
            read_only: If True, only SELECT queries are allowed.
                       Set False only if you trust the model completely.
        """
        self._db_path = db_path
        self._read_only = read_only

    def query(self, sql: str, limit: int = 50) -> str:
        """Execute a SQL query and return results as JSON.

        :param sql: SQL query to execute. Only SELECT queries are allowed.
        :param limit: Maximum number of rows to return (max 100).
        """
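        limit = max(1, min(limit, 100))  # Hard cap -- model cannot override

        # Reject non-SELECT queries if read_only mode is enabled.
        # This prefix check is only a first filter; pair it with a read-only
        # connection or database user for real enforcement.
        if self._read_only:
            sql_stripped = sql.strip().upper()
            if not sql_stripped.startswith("SELECT"):
                return json.dumps({
                    "error": "Only SELECT queries are permitted in read-only mode.",
                    "rejected_query": sql[:100],
                })

        conn = None  # Keeps the finally block safe if connect() itself fails
        try:
            conn = sqlite3.connect(self._db_path)
            conn.row_factory = sqlite3.Row  # Rows support access by column name
            # Cap rows with fetchmany() instead of appending "LIMIT n", which
            # breaks queries that already end in a LIMIT clause or a semicolon
            cursor = conn.execute(sql)

            # description is None for statements that return no rows
            columns = [desc[0] for desc in cursor.description or []]
            rows = [dict(zip(columns, row)) for row in cursor.fetchmany(limit)]

            return json.dumps({
                "columns": columns,
                "row_count": len(rows),
                "rows": rows,
            }, default=str)

        except (sqlite3.Error, sqlite3.Warning) as e:
            # Older Python versions raise sqlite3.Warning for multi-statement input
            return json.dumps({"error": f"SQL error: {e}", "query": sql[:200]})
        finally:
            if conn is not None:
                conn.close()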


# Register the database tool on an instance
db_tool = DatabaseQueryTool("/data/analytics.db", read_only=True)

# Manually add to registry (for instance methods that cannot be decorated directly)
registry._implementations["query_database"] = db_tool.query
registry._tools["query_database"] = {
    "name": "query_database",
    "description": (
        "Execute a read-only SQL SELECT query against the analytics database. "
        "Use this to retrieve data about users, events, and metrics."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "SQL SELECT query to execute."},
            "limit": {
                "type": "integer",
                "description": "Max rows to return. Default 50, max 100.",
                "minimum": 1,
                "maximum": 100,
            },
        },
        "required": ["sql"],
    },
}

File Operations Tool

import json
from pathlib import Path


class FileReadTool:
    """Read-only file access tool for LLM agents.

    Restricts access to a whitelist of allowed directories to prevent
    the model from reading sensitive system files.
    """

    def __init__(self, allowed_dirs: list[str]):
        self._allowed = [Path(d).resolve() for d in allowed_dirs]

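    def read_file(self, path: str) -> str:
        """Read the contents of a text file.

        :param path: File path, absolute or relative to an allowed directory.
        """
        # Resolve the path against each allowed directory. resolve() collapses
        # ".." segments and follows symlinks, so the check sees the real target.
        resolved = None
        for allowed_dir in self._allowed:
            candidate = (allowed_dir / path).resolve()
            # Compare whole path components (Python 3.9+). A string prefix
            # check would also accept "/data/docs_private" for "/data/docs".
            if candidate.is_relative_to(allowed_dir):
                resolved = candidate
                break

        # Security check: ensure the path is within an allowed directory
        if resolved is None:
            return json.dumps({
                "error": f"Access denied: {path} is outside allowed directories.",
                "allowed_directories": [str(d) for d in self._allowed],
            })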

        if not resolved.exists():
            return json.dumps({"error": f"File not found: {path}"})

        if not resolved.is_file():
            return json.dumps({"error": f"Path is not a file: {path}"})

        try:
            content = resolved.read_text(encoding="utf-8")
            # Limit content size to avoid filling the context window
            if len(content) > 50_000:
                content = content[:50_000] + "\n\n... [file truncated at 50,000 chars]"
            return json.dumps({"path": str(resolved), "content": content})
        except UnicodeDecodeError:
            return json.dumps({
                "error": f"Cannot read {path}: file is not valid UTF-8 text."
            })
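
A directory check like this one is sensitive to how containment is tested. The sketch below compares a string-prefix test with PurePath.is_relative_to (Python 3.9+) on made-up paths; the directory names are assumptions chosen to show the difference, and PurePosixPath is used so nothing touches the real filesystem.

```python
from pathlib import PurePosixPath

allowed_dir = PurePosixPath("/data/docs")
inside = PurePosixPath("/data/docs/report.txt")
sibling = PurePosixPath("/data/docs_private/secrets.txt")

# String prefix check: both paths start with the text "/data/docs"
prefix_inside = str(inside).startswith(str(allowed_dir))
prefix_sibling = str(sibling).startswith(str(allowed_dir))

# Component-wise check: only the path that is really under /data/docs passes
component_inside = inside.is_relative_to(allowed_dir)
component_sibling = sibling.is_relative_to(allowed_dir)

print(prefix_inside, prefix_sibling)
print(component_inside, component_sibling)
```

The prefix test lets the sibling directory through because it compares characters, not path components.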

Part 8 -- Timeout and Safety Limits

Production agents need hard limits. Models can enter loops, request tools that are slow, or make more calls than the user is willing to pay for.

import time
from dataclasses import dataclass

import anthropic
from anthropic.types import TextBlock, ToolUseBlock


@dataclass
class AgentLimits:
    """Safety limits for an agentic session."""
    max_iterations: int = 10         # Max tool call rounds
    max_wall_time_seconds: float = 60.0  # Max total elapsed time
    max_tool_calls: int = 25         # Max total individual tool invocations
    max_input_tokens: int = 100_000  # Max cumulative input tokens

    def __post_init__(self):
        self._start_time = time.monotonic()
        self._tool_call_count = 0
        self._total_input_tokens = 0

    @property
    def elapsed(self) -> float:
        return time.monotonic() - self._start_time

    def check(self, iteration: int, new_tool_calls: int = 0, new_tokens: int = 0) -> None:
        """Check all limits. Raises AgentLimitError if any limit is exceeded."""
        self._tool_call_count += new_tool_calls
        self._total_input_tokens += new_tokens

        if iteration >= self.max_iterations:
            raise AgentLimitError(
                f"Exceeded max iterations ({self.max_iterations}). "
                f"This usually means the agent is stuck in a loop. "
                f"Check your tool descriptions -- are they clear enough?"
            )
        if self.elapsed > self.max_wall_time_seconds:
            raise AgentLimitError(
                f"Exceeded max wall time ({self.max_wall_time_seconds}s). "
                f"Elapsed: {self.elapsed:.1f}s. "
                f"Consider adding async timeouts to slow tools."
            )
        if self._tool_call_count > self.max_tool_calls:
            raise AgentLimitError(
                f"Exceeded max tool calls ({self.max_tool_calls}). "
                f"Total so far: {self._tool_call_count}."
            )
        if self._total_input_tokens > self.max_input_tokens:
            raise AgentLimitError(
                f"Exceeded max input tokens ({self.max_input_tokens:,}). "
                f"Total so far: {self._total_input_tokens:,}."
            )


class AgentLimitError(RuntimeError):
    """Raised when an agent exceeds a configured safety limit."""
    pass


def run_agent_with_limits(
    user_message: str,
    tools: list[dict],
    implementations: dict[str, callable],
    limits: AgentLimits | None = None,
    model: str = "claude-opus-4-6",
) -> str:
    """Run the agent loop with safety limits enforced on every iteration."""
    limits = limits or AgentLimits()
    client = anthropic.Anthropic()

    messages = [{"role": "user", "content": user_message}]
    iteration = 0

    while True:
        response = client.messages.create(
            model=model,
            max_tokens=4_096,
            tools=tools,
            messages=messages,
        )

        # Count how many tool calls are in this response
        tool_calls_this_round = sum(
            1 for block in response.content
            if isinstance(block, ToolUseBlock)
        )

        # Check all limits before proceeding
        limits.check(
            iteration=iteration,
            new_tool_calls=tool_calls_this_round,
            new_tokens=response.usage.input_tokens,
        )

        if response.stop_reason == "end_turn":
            for block in response.content:
                if isinstance(block, TextBlock):
                    return block.text
            return ""

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = handle_multi_tool_response(response, implementations)
            messages.append({"role": "user", "content": tool_results})
            iteration += 1
            continue

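        # Any other stop reason (for example "max_tokens") ends the loop.
        # Return the text produced so far rather than silently returning "".
        return "".join(
            block.text for block in response.content
            if isinstance(block, TextBlock)
        )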
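
The wall-time limit above is only checked between API calls, so a single slow tool can still stall a round. One way to bound an individual tool is a per-call timeout. The sketch below assumes the tool is an async function; slow_lookup and call_tool_with_timeout are hypothetical names, and the timeout is turned into an error payload rather than an exception so the model can react to it.

```python
import asyncio
import json


async def slow_lookup(city: str) -> str:
    """Hypothetical async tool that takes too long to respond."""
    await asyncio.sleep(2.0)
    return json.dumps({"city": city, "temperature": "15C"})


async def call_tool_with_timeout(tool, timeout_seconds: float, **kwargs) -> str:
    """Await a tool coroutine, converting a timeout into an error payload."""
    try:
        return await asyncio.wait_for(tool(**kwargs), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Report the failure as data so the model can decide what to do next
        return json.dumps({
            "error": f"Tool timed out after {timeout_seconds} seconds.",
            "arguments": kwargs,
        })


result = asyncio.run(call_tool_with_timeout(slow_lookup, 0.1, city="London"))
print(result)
```

asyncio.wait_for cancels the awaited coroutine when the timeout expires, which works cleanly for async tools. Blocking tools running in threads cannot be cancelled this way and need their own internal timeouts.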

Part 9 -- Testing Tool-Using Agents

Testing agents is different from testing regular functions. The non-determinism of LLM outputs means you test behaviour at the level of: "does the agent use the right tools?", "does it recover from tool errors?", and "does it produce a final answer?".

import pytest
from unittest.mock import MagicMock, patch


class MockToolRegistry:
    """A tool registry with controllable tool implementations for testing."""

    def __init__(self):
        self._calls: list[dict] = []
        self._responses: dict[str, str] = {}

    def set_response(self, tool_name: str, response: str) -> None:
        """Pre-configure what a tool will return."""
        self._responses[tool_name] = response

    def set_error(self, tool_name: str, error: str) -> None:
        """Pre-configure a tool to fail with an error message."""
        self._responses[tool_name] = f"ERROR:{error}"

    def get_implementation(self, tool_name: str) -> callable:
        """Return a mock implementation that records calls and returns pre-configured responses."""
        registry = self

        def mock_tool(**kwargs) -> str:
            registry._calls.append({"tool": tool_name, "args": kwargs})
            response = registry._responses.get(tool_name, '{"result": "default mock response"}')
            if response.startswith("ERROR:"):
                raise RuntimeError(response[6:])
            return response

        return mock_tool

    @property
    def calls(self) -> list[dict]:
        return list(self._calls)

    def was_called(self, tool_name: str) -> bool:
        return any(c["tool"] == tool_name for c in self._calls)

    def call_count(self, tool_name: str) -> int:
        return sum(1 for c in self._calls if c["tool"] == tool_name)


def test_agent_calls_weather_tool():
    """Agent should call the weather tool when asked about weather."""
    mock_registry = MockToolRegistry()
    mock_registry.set_response(
        "get_weather",
        '{"city": "London", "temperature": "15C", "description": "Cloudy"}'
    )

    implementations = {
        "get_weather": mock_registry.get_implementation("get_weather"),
    }

    # NOTE: as written, this test calls the real API, so treat it as an
    # integration test that needs credentials. For a unit test, also
    # replace the LLM client with a mock (for example via pytest-mock)
    # so the run is fast and deterministic.
    result = run_agent_with_limits(
        user_message="What is the weather in London?",
        tools=registry.get_tools(),
        implementations=implementations,
    )

    assert mock_registry.was_called("get_weather"), "Agent should have called get_weather"
    assert mock_registry.call_count("get_weather") == 1, "Should only call once"
    assert "london" in result.lower() or "15" in result, "Response should mention London or temperature"


def test_agent_recovers_from_tool_error():
    """Agent should produce a useful response even when a tool fails."""
    mock_registry = MockToolRegistry()
    mock_registry.set_error(
        "get_weather",
        "API rate limit exceeded. Retry after 60 seconds."
    )

    implementations = {
        "get_weather": mock_registry.get_implementation("get_weather"),
    }

    # The agent should not crash. It should tell the user the tool failed.
    result = run_agent_with_limits(
        user_message="What is the weather in London?",
        tools=registry.get_tools(),
        implementations=implementations,
    )

    assert result, "Agent should produce some response even after tool failure"
    # The model should have received the error and reported it to the user


def test_agent_respects_iteration_limit():
    """Agent should raise AgentLimitError if stuck in a loop."""
    # A tool that always returns a result that makes the model call it again
    call_count = [0]
    def loopy_tool(**kwargs) -> str:
        call_count[0] += 1
        return '{"status": "incomplete", "call_again": true}'

    with pytest.raises(AgentLimitError):
        run_agent_with_limits(
            user_message="Keep calling loopy_tool until it reports that it is complete.",
            tools=[{
                "name": "loopy_tool",
                "description": "A tool that always says to call it again.",
                "input_schema": {"type": "object", "properties": {}, "required": []},
            }],
            implementations={"loopy_tool": loopy_tool},
            limits=AgentLimits(max_iterations=3),
        )
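
For unit tests that never touch the network, the LLM side can be scripted as well. The sketch below is one possible shape, not the lesson's own harness: ScriptedClient and run_loop are hypothetical names, the loop takes the client as a parameter (unlike run_agent_with_limits, which builds its own), and the SimpleNamespace objects imitate only the response fields the loop reads.

```python
from types import SimpleNamespace


class ScriptedClient:
    """Hypothetical stand-in for the API client: replays canned responses."""

    def __init__(self, responses: list):
        self._responses = list(responses)
        self.requests: list[dict] = []
        # Mirror the client.messages.create(...) call shape
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        self.requests.append(kwargs)
        return self._responses.pop(0)


def run_loop(client, user_message: str, implementations: dict) -> str:
    """Minimal agent loop with an injected client (simplified for testing)."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(messages=messages)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": implementations[block.name](**block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})


# Script: first the "model" requests a tool, then it gives a final answer
scripted = ScriptedClient([
    SimpleNamespace(
        stop_reason="tool_use",
        content=[SimpleNamespace(
            type="tool_use", id="toolu_1", name="get_weather",
            input={"city": "London"},
        )],
    ),
    SimpleNamespace(
        stop_reason="end_turn",
        content=[SimpleNamespace(type="text", text="It is 15C and cloudy in London.")],
    ),
])

tool_calls = []


def fake_weather(city: str) -> str:
    tool_calls.append(city)
    return '{"temperature": "15C", "description": "Cloudy"}'


answer = run_loop(scripted, "What is the weather in London?", {"get_weather": fake_weather})
print(answer)
print(tool_calls)
```

Because both sides are scripted, the test checks the loop's plumbing (tool dispatch, message threading, termination) rather than the model's judgement, which is left to integration tests.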

Part 10 -- Complete Working Example

Putting it all together: a research agent that can search the web, run Python calculations, and extract numbers from text.

import anthropic
import json

def build_research_agent() -> tuple[list[dict], dict[str, callable]]:
    """Build a research agent with web search and code execution capabilities."""

    agent_registry = ToolRegistry()

    @agent_registry.tool()
    def search_web(query: str, num_results: int = 5) -> str:
        """Search the web for current information and news.

        Use this when you need recent information not in your training data,
        or when the user asks about current events, prices, or status.

        :param query: Search query. Be specific for better results.
        :param num_results: Number of search results to return (1-10).
        """
        # Production: use Brave Search API, Bing, or SerpAPI
        # Stub for demonstration
        return json.dumps({
            "query": query,
            "results": [
                {
                    "title": f"Article about {query}",
                    "snippet": f"Relevant information about {query} from a credible source.",
                    "url": "https://example.com/article",
                    "published": "2026-03-01",
                }
            ]
        })

    @agent_registry.tool()
    def run_calculation(code: str) -> str:
        """Execute Python code for calculations and data analysis.

        Use this for mathematical calculations, data processing, or
        any computation that is easier to do with code than describe in words.

        :param code: Python code to execute. Must print the result to stdout.
        """
        # Use the execute_python tool from earlier
        return execute_python(code, timeout_seconds=10)

    @agent_registry.tool()
    def extract_numbers(text: str) -> str:
        """Extract all numbers from a text string.

        Use this to parse numeric values from web search results or
        other text when you need to perform calculations.

        :param text: Text to extract numbers from.
        """
        import re
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
        return json.dumps({
            "numbers_found": [float(n) for n in numbers],
            "count": len(numbers),
        })

    return agent_registry.get_tools(), agent_registry.get_implementations()


def run_research_session(question: str) -> str:
    """Run a research session to answer a complex question."""
    tools, implementations = build_research_agent()

    result = run_agent_with_limits(
        user_message=question,
        tools=tools,
        implementations=implementations,
        limits=AgentLimits(
            max_iterations=8,
            max_wall_time_seconds=30.0,
            max_tool_calls=15,
        ),
        model="claude-opus-4-6",
    )

    return result


if __name__ == "__main__":
    answer = run_research_session(
        "What is the approximate compound annual growth rate of Python's "
        "popularity index on TIOBE from 2015 to 2025? Use web search to "
        "find the relevant numbers, then calculate the CAGR."
    )
    print(answer)
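
For a question like this one, the code the model passes to run_calculation would be something along these lines. This is a sketch of the arithmetic only: the function name cagr is an assumption, and the inputs are placeholders chosen for easy checking, not real TIOBE ratings.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Placeholder inputs: a value that doubles over 10 years
rate = cagr(start_value=100.0, end_value=200.0, years=10)
print(f"{rate:.4%}")
```

Note that the snippet prints its result, which matches the run_calculation docstring's requirement that the code write to stdout.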

Key Takeaways

Tool use is a multi-turn protocol: the model requests a tool call; your Python code executes it; you report the result; the model continues. One user question may require several API calls.
The agentic loop needs safety limits: always set max_iterations, max_wall_time_seconds, and max_tool_calls. Without limits, a confused model can loop indefinitely and generate unbounded API costs.
Never crash on tool failure: wrap every tool call in execute_tool_safely. Send the error description back to the model and let it decide how to respond. Given a clear error message, the model can often retry with corrected arguments or explain the failure to the user.
The ToolRegistry pattern keeps schemas and implementations co-located. Schema is derived from type hints and docstrings, so documentation and the schema stay in sync automatically.
Parallel tool calls are supported by both APIs. Execute them concurrently with asyncio.gather to minimise wall-clock latency.
Tool descriptions are prompt engineering: a vague description leads to wrong tool choices. Be explicit about when to use each tool, what format the input should be in, and what the output represents.
Security is your responsibility: the model controls the arguments your tools receive. Validate all inputs, enforce read-only modes on databases, restrict file access to allowed directories, and cap code execution time.
Test at the behaviour level: verify that the agent calls the right tools, recovers from errors, and respects limits. Mock both the LLM and the tools for fast unit tests; run real API calls only in integration tests.
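
The point about parallel tool calls can be sketched as follows, assuming the tools are ordinary blocking functions. The get_weather stub and its sleep are stand-ins for a real network call, and run_tools_concurrently is a hypothetical helper name; asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


def get_weather(city: str) -> str:
    """Hypothetical blocking tool; the sleep stands in for a network call."""
    time.sleep(0.3)
    return f"{city}: 15C"


async def run_tools_concurrently(calls: list[dict]) -> list[str]:
    """Run each blocking tool call in a worker thread and wait for all of them."""
    tasks = [asyncio.to_thread(get_weather, **call) for call in calls]
    # gather preserves input order, so results line up with the calls
    return await asyncio.gather(*tasks)


start = time.monotonic()
results = asyncio.run(run_tools_concurrently([
    {"city": "Paris"}, {"city": "London"}, {"city": "Tokyo"},
]))
elapsed = time.monotonic() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

The three calls overlap, so the total time is close to that of one call rather than the sum of all three. Because results come back in call order, each one can be matched to its tool_use id when building the tool results message.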

Practice Problems

Problem 1: Schema Validator

The ToolRegistry._build_schema method does not handle list[str], dict[str, Any], or Optional[int] type hints. Extend it to:

Handle list[str] as {"type": "array", "items": {"type": "string"}}
Handle dict[str, Any] as {"type": "object"}
Handle Optional[X] (same as X | None) correctly -- the parameter should not be in required
Handle Literal["a", "b", "c"] as {"type": "string", "enum": ["a", "b", "c"]}

Write the extended _build_schema method and include at least 5 test cases covering these new types.

Problem 2: Tool Call Logger

Build a ToolCallLogger wrapper class that:

Wraps any existing tool implementation
Records: tool name, arguments, result, execution time (milliseconds), whether it was an error
Stores records in memory with a configurable max size (circular buffer)
Exposes a summary() method showing: total calls per tool, average execution time per tool, error rate per tool
Can be used transparently: implementations["get_weather"] = logger.wrap("get_weather", original_fn)

Problem 3: Retry with Backoff

Some tools fail transiently (rate limits, network errors). Build a RetryingToolExecutor that:

Wraps execute_tool_safely
Retries a tool up to N times on transient errors (detect transient errors by looking for "rate limit", "timeout", "503", "429" in the error message)
Uses exponential backoff: wait 1s, 2s, 4s between retries
Never retries logic errors (wrong arguments, access denied)
Adds metadata to the result: {"retries": 2, "final_result": ...}

Problem 4: Streaming Agent Loop

The current run_agent returns only after all tool calls are complete. For user-facing applications, you want to stream intermediate status. Build a generator-based agent loop:

from typing import Iterator


def stream_agent_events(
    user_message: str, tools, implementations
) -> Iterator[dict]:
    # Yield events like:
    # {"type": "tool_start", "name": "get_weather", "args": {...}}
    # {"type": "tool_result", "name": "get_weather", "result": "..."}
    # {"type": "text_delta", "text": "Based on the weather..."}
    # {"type": "done", "final_text": "..."}
    ...  # Your implementation goes here

This allows a frontend to show "Searching the web..." spinners and stream the final answer token by token.

Problem 5: Multi-Agent Orchestration

Build a simple multi-agent system where a "Planner" agent breaks a complex question into sub-tasks, and "Worker" agents (each with different tool sets) execute the sub-tasks in parallel.

The Planner should:

Receive a complex user question
Break it into 2-4 independent sub-tasks using a create_plan tool
Return a JSON list of subtasks

The Orchestrator should:

Run the Planner to get the task list
Dispatch each subtask to a Worker agent with the relevant tools
Collect Worker results
Run a "Synthesiser" agent that combines all results into a final answer

Design the data flow and implement at least the Planner + Orchestrator. You do not need real external tools -- stubs are fine.
